Keep it Simple: Language Acquisition Without Complex Bayesian Models

Author

  • J. B. Tenenbaum
Abstract

The work of Hsu & Chater [1] and Perfors et al. [2] establishes that sophisticated statistical learning techniques known as Hierarchical Bayesian Models (HBMs) can successfully capture certain observed patterns of both under- and over-generalization in child language acquisition. This paper shows that a much simpler method, maximum likelihood estimation (MLE), can equal HBM performance. The work in [2] analyzed dative alternations compiled from child-directed CHILDES English or from controlled language experiments (Wonnacott et al., [3]). However, HBMs are 'ideal' learning systems, known to be computationally infeasible (Kwisthout et al. [4]). Consequently, as [4] notes, the relevance of HBMs for cognitively plausible accounts of human learning remains uncertain. This paper shows that combining simple clustering methods with MLE provides an alternative, more cognitively plausible account of the same facts.

It has long been recognized that children manifest subtle patterns of under- and over-generalization in learning dative verb alternation frames, using a combination of verb-specific information and general verb-class behavior (Baker, 1979 [5]; Groppen et al., 1991 [6]), e.g., John told the police the story/told the story to the police, but *confessed the police the story. The HBM approach of [2] posits three levels of statistical estimation to capture this observed behavior: first, counts of individual verb occurrences in direct object dative frames (DOD), prepositional dative frames (PPD), or both (alternating); second, the frequency of the frames themselves; and finally, the hierarchical estimate of whether the alternation frames themselves are distributed uniformly or not. Observed counts in the CHILDES corpus may then be used to estimate whether an unseen verb will be DOD, PPD, or alternating. Is such complexity needed?
We re-analyzed the child-directed counts of the frames for 19 verbs (give, say, ..., mail) taken from the CHILDES Adam corpus as in [2], as well as subcategorization frame counts for these 19 verbs from all of the English CHILDES, approximately 32,000 examples altogether. We tested a total of 18 different non-HBM models, using several clustering methods (the latter implemented in the Weka package [7]). K-means clustering easily placed each verb into one of 3 groups, while a smoothed maximum likelihood estimate (MLE) using these groups yielded dative frame predictions closely matching the performance of HBMs. Fig. 1 illustrates. Dotted lines show the simplest model's performance, while solid lines are HBM variants. The y-axis plots the log deviation between MLE and HBM estimates, while the x-axis plots the number of example instances.
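The cluster-then-estimate idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the paper's implementation: the paper used Weka, whereas this sketch uses a hand-written one-dimensional k-means; the verb counts are invented toy numbers, not CHILDES data; clustering on the DOD proportion alone and using add-one (Laplace) smoothing are both assumptions, since the abstract does not specify the features or the smoothing scheme. All function names are hypothetical.

```python
from collections import defaultdict

# Invented toy counts (DOD, PPD) per verb. These are NOT the CHILDES figures.
TOY_COUNTS = {
    "give": (90, 10),
    "show": (80, 20),
    "tell": (85, 15),
    "send": (45, 55),
    "bring": (50, 50),
    "say": (2, 98),
    "explain": (1, 99),
}


def dod_proportion(counts):
    """Fraction of a verb's dative uses that are in the DOD frame."""
    dod, ppd = counts
    return dod / (dod + ppd)


def nearest_center(value, centers):
    """Index of the center closest to value."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))


def kmeans_1d(values, initial_centers, iterations=20):
    """Lloyd's algorithm on scalars, started from fixed initial centers."""
    centers = list(initial_centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            groups[nearest_center(v, centers)].append(v)
        # An empty group keeps its previous center.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


def smoothed_mle(dod, ppd, alpha=1.0):
    """Additively smoothed estimate of P(DOD); alpha=1 is an assumed choice."""
    return (dod + alpha) / (dod + ppd + 2 * alpha)


def fit(counts_by_verb, initial_centers=(0.0, 0.5, 1.0)):
    """Cluster verbs into 3 groups, then pool counts per group for the MLE."""
    props = {verb: dod_proportion(c) for verb, c in counts_by_verb.items()}
    centers = kmeans_1d(list(props.values()), initial_centers)
    labels = {verb: nearest_center(p, centers) for verb, p in props.items()}
    pooled = defaultdict(lambda: [0, 0])
    for verb, (dod, ppd) in counts_by_verb.items():
        pooled[labels[verb]][0] += dod
        pooled[labels[verb]][1] += ppd
    estimates = {lab: smoothed_mle(d, p) for lab, (d, p) in pooled.items()}
    return centers, labels, estimates


centers, labels, estimates = fit(TOY_COUNTS)
for verb in TOY_COUNTS:
    print(verb, labels[verb], round(estimates[labels[verb]], 3))
```

With these toy counts the three groups correspond loosely to mostly-PPD, alternating, and mostly-DOD verbs, and each verb inherits the pooled, smoothed estimate of its group. Pooling is what lets a sparsely attested verb borrow strength from its class, which is the role the higher levels play in the hierarchical model.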

Similar resources

A Bayesian approach for image denoising in MRI

Magnetic Resonance Imaging (MRI) is a notable medical imaging technique that is based on Nuclear Magnetic Resonance (NMR). MRI is a safe imaging method with high contrast between soft tissues, which made it the most popular imaging technique in clinical applications. MR image's visual quality plays a vital role in medical diagnostics that can be severely corrupted by existing noise duri...

The Interlanguage of Persian Learners of Italian: a Focus on Complex Predicates

This paper aims at investigating the acquisition of Italian complex predicates by native speakers of Persian. Complex predication is not as pervasive a phenomenon in Italian as it is in Persian. Yet Italian native speakers use complex predicates productively; spontaneous data show that Persian learners of Italian seem to be perfectly aware of Italian complex predicates and use this familiar fea...

A Dynamical System Approach to Research in Second Language Acquisition

Epistemologically speaking, second language acquisition research (SLAR) might be reconsidered from a complex dynamical system view with interconnected aspects in the ecosystem of language acquisition. The present paper attempts to introduce the tenets of complex system theory and its application in SLAR. It has been suggested that the present dominant traditions in language acquisition research...

Universal Grammar and Chaos/Complexity Theory: Where Do They Meet And Where Do They Cross?

The present study begins by sketching "Chaos/Complexity Theory" (C/CT) and its application to the nature of language and language acquisition. Then, the theory of "Universal Grammar" (UG) is explicated with an eye to C/CT. Firstly, it is revealed that C/CT may or may not be allied with a theory of language acquisition that takes UG as the initial state of language acquisition for ...

Language Acquisition and Probabilistic Models: keeping it simple

Hierarchical Bayesian Models (HBMs) have been used with some success to capture empirically observed patterns of under- and overgeneralization in child language acquisition. However, as is well known, HBMs are “ideal” learning systems, assuming access to unlimited computational resources that may not be available to child language learners. Consequently, it remains crucial to carefully assess the...

A Bayesian Model of Natural Language Phonology: Generating Alternations from Underlying Forms

A stochastic approach to learning phonology. The model presented captures 7-15% more phonologically plausible underlying forms than a simple majority solution, because it prefers “pure” alternations. It could be useful in cases where an approximate solution is needed, or as a seed for more complex models. A similar process could be involved in some stages of child language acquisition; in parti...

Publication date: 2014